Indexing The World Wide Web: The Journey So Far

Authors

  • Abhishek Das
  • Ankit Jain
Abstract

In this chapter, we describe the key indexing components of today’s web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. We present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand new search engine. We highlight techniques that improve the relevance of results, discuss trade-offs for making the best use of machine resources, and cover distributed processing concepts in this context. In particular, we delve into indexing phrases instead of terms, storing the index in memory vs. on disk, and data partitioning. We finish with some thoughts on information organization for newly emerging data forms.


Similar resources

Weaving a Web of linked resources

This editorial introduces the special issue based on the best papers from ESWC 2015. And since ESWC’15 marked 15 years of Semantic Web research, we extended this editorial to a position paper that reflects the path that we, as a community, traveled so far with the goal of transforming the Web of Pages to a Web of Resources. We discuss some of the key challenges, research topics and trends addre...


Suitability of Signature Indexing Over the World Wide Web

Signature indexing has been studied extensively in text databases and other databases for many years. The main advantages of a signature file as an access index are its small size, distributability, the ability to index information of a wide variety of types, ease of maintenance, and the ability to provide fuzzy indexing. These features are precisely what are needed for a good access index for inde...


Context Based Indexing On Synonym System Using Hierarchical Clustering In Web Mining

Nowadays, the World Wide Web is a collection of a large amount of information, which is increasing day by day. This growing amount of information calls for an efficient and effective indexing structure. Indexing has become a major issue for improving the performance of Web search engines, so that the most relevant web documents are retrieved in minimum possib...


Search Engine using Apache Lucene

The World-Wide Web is a huge network of billions of workstations, and this network contains billions of web pages with information on a wide variety of topics. People discuss many topics and share opinions and suggestions on various social networking sites that users are interested in. Low precision and low recall still exist in current search engines. So a search...


A Unified Approach to Indexing Multimedia on the Web

Indexing multimedia Web documents can be regarded as an important part of Web engineering, a concept first proposed [19] by one of the authors and his collaborators in 1998 at the World Wide Web WWW7 conference in Brisbane, Australia. Content-based indexing of multimedia has always been a challenging task. The enormity and diversity of the multimedia content on the World Wide Web (WWW) adds anot...



Publication date: 2010